Fix flaky NRTReplicationEngineTests segment count assertions #21044

Open
andrross wants to merge 1 commit into opensearch-project:main from andrross:fix/nrt-replication-engine-flaky-tests
@andrross
Member

The testAcquireLastIndexCommit and testGetSegmentInfosSnapshotPreservesFilesUntilRelease tests used generateHistoryOnReplica, which randomly picks operation types and document IDs. With certain seeds, this produced DELETE or NO_OP operations, or duplicate IDs with out-of-order seqNos, resulting in fewer segments than the tests expected. The shared engine's random merge policy could also merge segments during refresh.
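
To make the seed dependence concrete, here is a toy model in plain Java. It is not OpenSearch's generateHistoryOnReplica: the class RandomHistoryModel, the OpType enum, and the rule that only an INDEX operation with a not-yet-indexed ID counts are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model only (hypothetical names); it does not call any OpenSearch API.
class RandomHistoryModel {
    enum OpType { INDEX, DELETE, NO_OP }

    // Generates numOps random operations and counts those that are INDEX
    // operations on an ID that no earlier INDEX operation has used.
    static int countFreshIndexOps(long seed, int numOps) {
        Random random = new Random(seed);
        OpType[] types = OpType.values();
        Set<String> indexedIds = new HashSet<>();
        int fresh = 0;
        for (int i = 0; i < numOps; i++) {
            // Random operation type: may be DELETE or NO_OP instead of INDEX.
            OpType type = types[random.nextInt(types.length)];
            // Random ID from a small space, so duplicates are possible.
            String id = Integer.toString(random.nextInt(numOps));
            if (type == OpType.INDEX && indexedIds.add(id)) {
                fresh++;
            }
        }
        return fresh;
    }
}
```

In this model the count is reproducible for a given seed but can differ between seeds, so an assertion on a fixed count can pass under one seed and fail under another.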

Replace generateHistoryOnReplica with deterministic INDEX operations using distinct document IDs. Use a local primary engine with ForceMergePolicy to prevent automatic merges while still allowing the explicit forceMerge the tests require.
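
The deterministic replacement can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in this PR: DeterministicHistory and IndexOp are hypothetical names, and the engine, refresh, and ForceMergePolicy wiring is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real test builds engine operations instead.
class DeterministicHistory {
    // Minimal stand-in for an INDEX operation: a document ID and a sequence number.
    static final class IndexOp {
        final String id;
        final long seqNo;

        IndexOp(String id, long seqNo) {
            this.id = id;
            this.seqNo = seqNo;
        }
    }

    // Builds numOps INDEX operations with distinct IDs ("0", "1", ...) and
    // strictly increasing sequence numbers, independent of any random seed.
    static List<IndexOp> build(int numOps) {
        List<IndexOp> ops = new ArrayList<>(numOps);
        for (int i = 0; i < numOps; i++) {
            ops.add(new IndexOp(Integer.toString(i), i));
        }
        return ops;
    }
}
```

Because every operation is an INDEX with a new ID and the next seqNo, the generated history is identical on every run.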

This test would fail with the following seed:

./gradlew ':server:test' --tests 'org.opensearch.index.engine.NRTReplicationEngineTests.testGetSegmentInfosSnapshotPreservesFilesUntilRelease' -Dtests.seed=8AC50870EC095B74

Related Issues

Resolves #15817

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@github-actions
Contributor

Failed to generate code suggestions for PR

@github-actions
Contributor

❌ Gradle check result for 12049a9: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Labels

  • autocut
  • flaky-test: Random test failure that succeeds on second run
  • skip-changelog
  • >test-failure: Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for NRTReplicationEngineTests